Introduction

The deer tick or blacklegged tick (Ixodes scapularis) is the most medically relevant species of hard tick that in New York State (NYS). I. scapularis populations in NYS may carry a plethora of pathogens, including but not limited to:

  • Borrelia burgdorferi - the causitive agent of Lyme disease
  • Anaplasma phagocytophilum - the causitive agent of human granulocytic anaplasmosis
  • Babesia microti - the causitive agent of babesiosis in humans
  • Borrelia miyamotoi - causative of a relapsing tick fever

The prevalence of the above pathogens varies both spatially and temporally in NYS, which results in varying risk for NYS citizens. Because these pathogens are not transmitted from tick to tick transovarially, they are required to be maintained throughout populations of animal hosts in the various ecosystems of NYS. Data suggests that certain land use and land cover types are beneficial for populations of animals which maintain these pathogens, and potentially for the pathogen’s themselves. This project aims to explore any trends between pathogen prevalence within I. scapularis at the county level and land use / land cover types in NYS.


Materials and methods


Methods - Software information

All data processing and analysis were preformed using RStudio with the following version specifications and packages:

## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS  10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] sjstats_0.18.0     pscl_1.5.5         sjlabelled_1.1.7   sjmisc_2.8.5      
##  [5] sjPlot_2.8.6       glue_1.4.2         tmap_3.2           rgeos_0.5-5       
##  [9] rgdal_1.5-18       gdalUtils_2.0.3.2  leaflet_2.0.3      spData_0.3.8      
## [13] sf_0.9-6           raster_3.3-13      sp_1.4-4           FedData_3.0.0.9000
## [17] forcats_0.5.0      stringr_1.4.0      dplyr_1.0.2        purrr_0.3.4       
## [21] tidyr_1.1.2        tibble_3.0.4       ggplot2_3.3.2      tidyverse_1.3.0   
## [25] readr_1.4.0       
## 
## loaded via a namespace (and not attached):
##   [1] minqa_1.2.4             TH.data_1.0-10          leafem_0.1.3           
##   [4] colorspace_1.4-1        ellipsis_0.3.1          class_7.3-17           
##   [7] estimability_1.3        parameters_0.9.0        base64enc_0.1-3        
##  [10] fs_1.5.0                dichromat_2.0-0         rstudioapi_0.11        
##  [13] mvtnorm_1.1-1           fansi_0.4.1             lubridate_1.7.9        
##  [16] xml2_1.3.2              splines_4.0.2           codetools_0.2-16       
##  [19] R.methodsS3_1.8.1       knitr_1.30              jsonlite_1.7.1         
##  [22] nloptr_1.2.2.2          ggeffects_0.16.0        tmaptools_3.1          
##  [25] broom_0.7.1             dbplyr_1.4.4            png_0.1-7              
##  [28] R.oo_1.24.0             effectsize_0.4.0        compiler_4.0.2         
##  [31] httr_1.4.2              emmeans_1.5.1           backports_1.1.10       
##  [34] Matrix_1.2-18           assertthat_0.2.1        cli_2.1.0              
##  [37] leaflet.providers_1.9.0 htmltools_0.5.0         tools_4.0.2            
##  [40] coda_0.19-4             gtable_0.3.0            Rcpp_1.0.5             
##  [43] cellranger_1.1.0        vctrs_0.3.4             nlme_3.1-149           
##  [46] iterators_1.0.13        leafsync_0.1.0          crosstalk_1.1.0.1      
##  [49] insight_0.10.0          lwgeom_0.2-5            xfun_0.18              
##  [52] lme4_1.1-23             rvest_0.3.6             lifecycle_0.2.0        
##  [55] statmod_1.4.34          XML_3.99-0.5            zoo_1.8-8              
##  [58] MASS_7.3-53             scales_1.1.1            hms_0.5.3              
##  [61] sandwich_3.0-0          parallel_4.0.2          RColorBrewer_1.1-2     
##  [64] yaml_2.2.1              stringi_1.5.3           bayestestR_0.7.5       
##  [67] foreach_1.5.1           spDataLarge_0.4.1       e1071_1.7-4            
##  [70] boot_1.3-25             rlang_0.4.8             pkgconfig_2.0.3        
##  [73] evaluate_0.14           lattice_0.20-41         htmlwidgets_1.5.2      
##  [76] tidyselect_1.1.0        magrittr_1.5            R6_2.5.0               
##  [79] generics_0.1.0          multcomp_1.4-14         DBI_1.1.0              
##  [82] pillar_1.4.6            haven_2.3.1             withr_2.3.0            
##  [85] units_0.6-7             stars_0.4-3             survival_3.2-7         
##  [88] abind_1.4-5             performance_0.5.1       modelr_0.1.8           
##  [91] crayon_1.3.4            KernSmooth_2.23-17      rmarkdown_2.4          
##  [94] grid_4.0.2              readxl_1.3.1            data.table_1.13.2      
##  [97] blob_1.2.1              reprex_0.3.0            digest_0.6.27          
## [100] classInt_0.4-3          xtable_1.8-4            R.utils_2.10.1         
## [103] munsell_0.5.0           viridisLite_0.3.0


Methods - Data Gathering

The data for this project are gathered from three main sources. The first source is the New York State Department of Health (NYSDOH) Vector Ecology Laboratory’s tick borne pathogen surveillance dataset. This dataset contains tick field surveillance data throughout NYS from 2008 - present, as well as corresponding PCR testing data for the four pathogens mentioned in the introduction. These data are spatially aggregated to the county level and are available online at the following links for Nymphs and Adults: These data were gathered into R using the following code:

tdir=tempdir()
adulturl <- "https://health.data.ny.gov/api/views/vzbp-i2d4/rows.csv?accessType=DOWNLOAD&bom=true&query=select+*"
nymphurl <- "https://health.data.ny.gov/api/views/kibp-u2ip/rows.csv?accessType=DOWNLOAD&bom=true&query=select+*"
if(file.exists(paste(tdir,"/Adultdata.csv",sep=""))==FALSE){
download.file(adulturl, destfile = file.path(tdir,"/Adultdata.csv"))}
if(file.exists(paste(tdir,"/Nymphdata.csv",sep=""))==FALSE){
download.file(nymphurl,destfile = file.path(tdir,"/Nymphdata.csv"))}
Nymphdata <- read_csv(paste(tdir,"/Nymphdata.csv",sep=""))
Adultdata <- read_csv(paste(tdir,"/Adultdata.csv",sep=""))

The second data source is county-level shapefiles from NYS GIS clearinghouse, available at the following link: NYS GIS Clearinghouse. These data were read into R using the following code:

shpurl="http://gis.ny.gov/gisdata/fileserver/?DSID=927&file=NYS_Civil_Boundaries.shp.zip"
if(file.exists(paste(tdir,"/NYS_Civil_Boundaries_SHP/Counties_Shoreline.shp",sep=""))==FALSE){
download.file(shpurl, destfile = file.path(tdir,"Boundaries.zip"))
unzip(file.path(tdir,"Boundaries.zip"),exdir = tdir)}
Cnt=read_sf(paste(tdir,"/NYS_Civil_Boundaries_SHP/Counties_Shoreline.shp",sep=""))
County_Boundaries = Cnt[,c(1,18)]
NYS = st_union(County_Boundaries)
simpSpNYS=st_simplify(NYS,dTolerance=100)
SpNYS=as(NYS,Class="Spatial")

The third and final data source is land use and land cover data from the 2016 National Land Cover Dataset, available at the following link: NLCD. These data were downloaded using R-specific NLCD calls. The call was looped to iterate the data download for each county, which would then overwrite the previous county’s data to conserve storage. The analysis was also grouped into this code, which will be explained later.

newcounties=County_Boundaries %>% filter(County != "New York",
                                         County != "Queens",
                                         County != "Kings",
                                         County != "Richmond",
                                         County != "Bronx")
st_transform(newcounties,crs="+proj=utm +zone=18 +datum=NAD83 +units=m +no_defs")
df1=NULL

for(i in 1:nrow(newcounties)){
        NLCD=get_nlcd(template = st_as_sf(newcounties[i,]),
                      year = 2016,
                      label="Same",
                      dataset='Land_Cover',
                      landmass="L48",
                      extraction.dir = tdir,
                      force.redo = T)
        NLCD2=projectRaster(from=NLCD,crs="+proj=utm +zone=18 +datum=NAD83 +units=m +no_defs")
        cropnew=crop(NLCD2,extent(newcounties[i,]))
        masknew=mask(cropnew,newcounties[i,])
        masknew2= masknew %in% c(41,43)
        masknew3=masknew/masknew
        df=data.frame(County=paste(newcounties[[i,1]]),
                      Total_Pixels=cellStats(masknew3,'sum'),
                      Total_Forest_Pixels=cellStats(masknew2,stat='sum'),
                      Percent_Forest_Cover=cellStats(masknew2,stat='sum')/cellStats(masknew3,'sum'))
        if(is.null(df1)==TRUE){df1=df}
        else(df1=rbind(df1,df))}


Methods - Data visualization

In order to visualize pathogen prevalence throughout NYS, the ggplot and leaflet packages were used to create time-series graphs and interactive maps. The code to create the NYS average pathogen prevalence time-series graph is shown below:

plot1=ggplot(data = All_groupings,aes(x=Year,y=percpos,color=Pathogen)) +
        geom_line() +
        ggtitle(expression(paste("Time Series Graph of ", italic("Ixodes scapularis")," Infectivity Rates (NYS)",sep=""))) +
        theme(axis.text.x = element_text(angle=65,vjust=.65)) +
        geom_hline(yintercept = 0,linetype='solid',color='black',size=.5,alpha=.7) +
        geom_vline(xintercept = 2008,linetype='solid',color='black',size=.5,alpha=.7) +
        facet_wrap(~`Life Stage`) +
        ylab("Pathogen Prevalence (%)")+
        scale_x_continuous(breaks=c(2008:2019))

The code to create an interactive map where each clicking each county displays a time-series graph comparing county-level pathogen rates to the NYS average is shown below:

County_Boundaries=st_transform(County_Boundaries,CRS("+proj=longlat +datum=WGS84"))
names(County_Boundaries)[1]="County"
test_new=left_join(County_Boundaries,test2,by='County')
plot2=leaflet() %>%
        addTiles() %>%
        addPolygons(data=test_new,
                    group='County',
                    weight=1,
                    label=~County,
                    highlight=highlightOptions(
                        weight=5,
                        color='red',
                        bringToFront=TRUE)) %>%
        leafpop::addPopupGraphs(graph=test_new$plot,
                                group='County',
                                width=600,height=300)


Methods - Raster Analysis

NLCD Raster data were used to determine the percent of deciduous and mixed (deciduous/coniferous) forest that covered each county. The code to calculate this data was shown earlier, but a detailed explanation (which ignores the loop) is shown below.

Data for each county is downloaded via this code:

        NLCD=get_nlcd(template = st_as_sf(newcounties[i,]),
                      year = 2016,
                      label="Same",
                      dataset='Land_Cover',
                      landmass="L48",
                      extraction.dir = tdir,
                      force.redo = T)

The downloaded raster is then projected on to WGS84:

NLCD2=projectRaster(from=NLCD,crs='+proj=longlat +datum=WGS84')

The projected raster is then cropped to the extent of the county required:

cropnew=crop(NLCD2,extent(newcounties[i,]))

The cropped raster is then masked by the county required:

masknew=mask(cropnew,newcounties[i,])

The masked raster is then filtered to only include either deciduous or mixed forest pixels:

masknew2= masknew %in% c(41,43)

All pixel values of the original masked raster are then changed to a value of 1, in order to calculate the total number of pixels in the raster:

masknew3=masknew/masknew

A dataframe is then compiled which includes the name of the County, the total number of pixels within that county, the total number of mixed/deciduous forest pixels within the county, and the percent of pixels which are mixed/deciduous.

df=data.frame(County=paste(newcounties[[i,1]]),
    Total_Pixels=cellStats(masknew3,'sum'),
    Total_Forest_Pixels=cellStats(masknew2,stat='sum'),
    Percent_Forest_Cover=cellStats(masknew2,stat='sum')/cellStats(masknew3,'sum'))


Methods - Statistical Analysis

Following the collection of pathogen and raster data, each variable was examined for normality. The distribution of each variable determined the type of analysis used. For analysis, pathogen prevalence was always the dependent variable. If both the percent forest cover and pathogen prevalence variables were normal, a simple test for correlation was used. If the pathogen prevalence variable seemed to follow a negative binomial distribution, the zero-inflated negative binomial regression model was used.


Results


Results - NYS Average Rates Within I. scapularis Populations

Pathogen prevalence in NYS shows a general trend of increase over time. Average rates of each pathogen are consistent with expectations where B. burgdorferi shows the highest prevalence within I. scapularis populations in NYS. A. phagocytophilum , B. microti , and B. miyamotoi exhibit the second, third, and fourth highest prevalence, respectively. Additionally, prevalence of each pathogen is about half as much in nymphal ticks compared to adult ticks.


Results - County Average Rates Within I. scapularis Populations

General takeaways from the interactive county level map are the following:

  1. Rates in the capital district and metropolitan areas of NYS are generally higher than the NYS average, and also have more complete data.
  2. The north county (Adirondack region) has generally prevalence numbers (likely due to low numbers of tick populations).
  3. In counties with complete data, pathogen prevalence generally has risen over time.


Results - Forest Cover by County in New York State


Results - Statistical Analysis

The percent area of each county covered by deciduous or mixed forest appears normally distributed.

B. burgdorferi appears to be the only normally distributed pathogen data among adults. This likely occurs because it has the highest pathogen prevalence and is endemic throughout NYS. Prevalence among all other pathogens appears to be zero-inflated, likely because numbers are low. Lower numbers makes the pathogen data approach a Poisson/Negative Binomial distribution because it is count data.

Similar to adults, B. burgdorferi prevalence among nymphs near normal. All others approach Poisson/Negative Binomial distribution. Due to the normality of B. burgdorferi , Pearson correlation tests were used to examine the relationship between percent mixed/deciduous forest cover and B. burgdorferi prevalence.

Adult test is shown below:

## 
##  Pearson's product-moment correlation
## 
## data:  adulttestingdata$Percent_Forest_Cover and adulttestingdata$`B. burgdorferi (%)`
## t = -0.24767, df = 423, p-value = 0.8045
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.10703993  0.08317493
## sample estimates:
##         cor 
## -0.01204144

The p-value from the test of correlation between county-level percent forest cover and adult B. burgdorferi prevalence fails to reject the null hypothesis, indicating there is no correlation in this sample.

Nymph test is shown below:

## 
##  Pearson's product-moment correlation
## 
## data:  nymphtestingdata$Percent_Forest_Cover and nymphtestingdata$`B. burgdorferi (%)`
## t = 0.32964, df = 372, p-value = 0.7419
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.08446414  0.11829025
## sample estimates:
##        cor 
## 0.01708873

Similar to adults, the p-value from the test of correlation between county-level percent forest cover and nymphal B. burgdorferi prevalence fails to reject the null hypothesis, indicating there is no correlation in this sample.

To examine the relationship between percent mixed/deciduous forest cover and A. phagocytophilum , B. microti , and B. miyamotoi prevalence, zero-inflated negative binomial (ZINB) models were used. The dependent variable for each count and zero-inflated model was the count of a particular pathogen, where the independent variable was percent mixed/deciduous forest cover. Code to create each model for both adults and nymphs is shown below:

adult_ANA_reg=zeroinfl(formula = ANApos ~ Percent_Forest_Cover |  Percent_Forest_Cover,
                 data=adulttestingdata,dist='negbin')
adult_Bmic_reg=zeroinfl(formula = Bmicpos ~ Percent_Forest_Cover |  Percent_Forest_Cover,
                 data=adulttestingdata,dist='negbin')
adult_Bmiya_reg=zeroinfl(formula = Bmiyapos ~ Percent_Forest_Cover |  Percent_Forest_Cover,
                 data=adulttestingdata,dist='negbin')
nymph_ANA_reg=zeroinfl(formula = ANApos ~ Percent_Forest_Cover |  Percent_Forest_Cover,
                 data=nymphtestingdata,dist='negbin')
nymph_Bmic_reg=zeroinfl(formula = Bmicpos ~ Percent_Forest_Cover |  Percent_Forest_Cover,
                 data=nymphtestingdata,dist='negbin')
nymph_Bmiya_reg=zeroinfl(formula = Bmiyapos ~ Percent_Forest_Cover |  Percent_Forest_Cover,
                 data=nymphtestingdata,dist='negbin')


Significance for adult ZINB models are shown below:
Adult Regression Results
  A. phagocytophilum B. microti B. miyamotoi
Predictors p p p
(Intercept) <0.001 <0.001 0.001
Percent_Forest_Cover 0.266 0.135 0.005
Zero-Inflated Model
(Intercept) 0.093 0.165 0.225
Percent_Forest_Cover 0.055 0.141 0.139
Observations 425 425 231


Coefficient of adult B. miyamotoi model shown below:

adult_Bmiya_reg$coefficients$count
##          (Intercept) Percent_Forest_Cover 
##            0.9398944           -2.3622950


Significance for nymphal ZINB models are shown below:
Nymphal Regression Results
  A. phagocytophilum B. microti B. miyamotoi
Predictors p p p
(Intercept) <0.001 <0.001 0.141
Percent_Forest_Cover 0.183 <0.001 0.002
Zero-Inflated Model
(Intercept) 0.052 0.183 NaN
Percent_Forest_Cover 0.024 0.188 NaN
Observations 374 374 222


Coefficient of statistically significant nymphal models shown below:


Nymphal A. phagocyotphilum zero-inflated model:

nymph_ANA_reg$coefficients$zero
##          (Intercept) Percent_Forest_Cover 
##             5.398699           -73.873538


Nymphal B. microti count model:

nymph_Bmic_reg$coefficients$count
##          (Intercept) Percent_Forest_Cover 
##             1.986349            -3.640456


Nymphal B. miyamotoi count model:

nymph_Bmiya_reg$coefficients$count
##          (Intercept) Percent_Forest_Cover 
##             0.510452            -3.406799

Conclusions

Tickborne pathogen prevalence in NYS is an ever-increasing problem. NYSDOH Vector Ecology Laboratory tick sampling indicates that pathogen prevalence for B. burgdorferi , A. phagocytophilum , B. microti , and B. miyamotoi have been increasing over time. Increases in pathogen prevalence provides a warning for increased risk of tickborne disease for NYS citizens. Although the prevalence of each pathogen, and thus, risk for disease differs by county, the general trend indicates that tickborne disease will be an issue for years to come in NYS. The interactive map in this project provides a user-friendly ability to navigate publicly available data to determine the potential risk for tickborne pathogen exposure in your county.

One extremely important factor in determining risk for tickborne pathogen exposure is tick habitat.< I. scapularis prefer deciduous and mixed forest ecosystems, making tick prevention methods in these ecosystems an important topic for preventing tickborne disease in the future. This analysis aimed to determine if the percent of a county covered by these types of forest plays any role in pathogen prevalence rates. The method behind this thinking is that the more available area for tick populations to expand dually provides tickborne pathogens the ability to expand into these areas. Coincidentally, preliminary results from the correlation tests and ZINB modeling show that the percent of a county covered by mixed or deciduous forest is either not directly related to tickborne pathogen prevalence, or actually shows a decreased risk for pathogen prevalence. There are two likely explanations for these results. The first is that aggregating forest cover to large spatial areas (counties in this study), do not represent the microvariations in tick habitat that allow populations to proliferate. Smaller spatial scales may provide a more accurate representation of the relationship between land cover and pathogen prevalence. Another explanation is that there is sampling bias in these data. The NYSDOH is tasked with protecting the health of the public, and thus, sampling is generally conducted:

  1. Where the greatest risk for tick-human interaction is.
  2. Where the density of tick populations is high enough to get a good sample size.

The first bias would cause more data to be gathered around suburban areas due to the larger number of individuals at risk. Counties with more suburban areas will likely have a lower percentage of forest cover, thus biasing the data.

The second bias would cause more data to be gathered in areas where tick populations are large enough for the purpose of sampling. Often times, this occurs in areas with forest edge and nearby woodlots. These types of areas represent microecosystems perfect for tick population expansion, thus, making sampling easier for NYSDOH staff.

In summation, the two takeaways from this project are the following:

  1. Tick pathogen prevalence has been increasing over time in NYS.
  2. A more detailed analysis is required to examine the role forest cover plays with pathogen prevalence.


References

Will add later.